Towards a Reference Corpus for Automatic Toponym Resolution Evaluation
Abstract
Spatial named entities ground events in space, and this relationship is essential for advanced text-processing applications such as question answering and event tracking. Toponym resolution is the task of mapping an entity, given its context, to a spatial representation (an extensional coordinate model). Whereas work on the temporal dimension is ongoing [17], to date no reference corpus exists for evaluating competing toponym resolution algorithms. This paper argues that a shareable evaluation resource is necessary and proposes both a markup scheme and an annotation process for such a corpus. We present TRML, an XML-based markup language, and TAME, the Toponym Annotation Markup Editor, both components of a tool chain developed within an ongoing corpus curation effort to address this gap.
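The abstract frames toponym resolution as mapping a place name to coordinates and argues that competing resolvers need a shared gold standard. As a rough illustration of how such a reference corpus could be used, the sketch below scores a resolver's predicted coordinates against gold coordinates by great-circle (haversine) distance and reports the fraction of toponyms resolved within a distance threshold. The metric, the `Resolution` record, the `accuracy_within` helper, and the 161 km (about 100 miles) default threshold are all hypothetical choices made for illustration. They are not taken from TRML, TAME, or the paper's own evaluation protocol.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula


@dataclass
class Resolution:
    """Hypothetical record: one toponym mention resolved to a point (degrees)."""
    toponym: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_within(gold: list[Resolution], predicted: list[Resolution],
                    threshold_km: float = 161.0) -> float:
    """Fraction of predictions within threshold_km of the gold point.

    Assumes gold and predicted are aligned mention by mention; the
    threshold default is an illustrative assumption, not a standard.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0  # convention chosen here for an empty corpus
    hits = 0
    for g, p in zip(gold, predicted):
        if g.toponym != p.toponym:
            raise ValueError(f"misaligned mentions: {g.toponym!r} vs {p.toponym!r}")
        if haversine_km(g.lat, g.lon, p.lat, p.lon) <= threshold_km:
            hits += 1
    return hits / len(gold)


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    gold = [Resolution("Sheffield", 53.38, -1.47),  # Sheffield, UK
            Resolution("Paris", 48.86, 2.35)]       # Paris, France
    predicted = [Resolution("Sheffield", 53.38, -1.47),  # correct referent
                 Resolution("Paris", 33.66, -95.56)]     # Paris, Texas: wrong referent
    print(accuracy_within(gold, predicted))  # 0.5
```

A distance-based score rewards resolvers that pick a nearby referent even when the exact gazetteer entry differs; a reference corpus could equally support exact-match scoring against gazetteer identifiers.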
Similar resources
Creating a Novel Geolocation Corpus from Historical Texts
This paper describes the process of annotating a historical US Civil War corpus with geographic references. Reference annotations are given at two different textual scales: individual place names and documents. This is the first published corpus of its kind in document-level geolocation, and it has over 10,000 disambiguated toponyms, double the amount of any prior toponym corpus. We outline many...
Toponym Resolution: A First Large-Scale Comparative Evaluation
Toponym resolution (TR) is the task of mapping the name of a location to a spatial representation of the location referred to, such as the centroid of the location, given as latitude/longitude. While a number of systems for automating the task have been described in the literature, to date no comparative evaluation study has existed, mainly for lack of a standard benchmark (i.e., gazetteer and ...
Toponym recognition in custom-made map titles
The titles of customized topographic maps constitute a specific corpus characterized by a very large number of place names and spelling variations. This paper is about identifying toponyms in these titles. Toponym tracking relies on gazetteers as well as pattern-based light parsing. The method used broadens the definition of the toponym to include the nature of the ...
Resolving fine granularity toponyms: Evaluation of a disambiguation approach
Landscape descriptions in natural language, for instance from historic corpora, are a complementary source to empirical ethnographic work, for example to research exploring variation in the use of basic levels or basic terms within landscapes across localities (cf. Mark and Turk 2003, Burenhult and Levinson 2008, Turk et al. 2011), on the condition that such descriptions can be linked to space...
Toponym Resolution in Text: “Which Sheffield is it?”
Named entity tagging comprises the sub-tasks of identifying a text span and classifying it, but this view ignores the relationship between the entities and the world. Spatial and temporal entities ground events in space-time, and this relationship is vital for applications such as question answering and event tracking. There is much recent work regarding the temporal dimension [13, 10], but no ...